{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction to atomman: table conversions\n",
    "\n",
    "__Lucas M. Hale__, [lucas.hale@nist.gov](mailto:lucas.hale@nist.gov?Subject=ipr-demo), _Materials Science and Engineering Division, NIST_.\n",
    "    \n",
    "[Disclaimers](http://www.nist.gov/public_affairs/disclaimer.cfm) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Introduction<a id='section1'></a>\n",
    "\n",
    "Atomic configuration data is often expressed in a tabular form as it is an efficient way of representing per-atom properties.  Many commn formats, such as the xyz format, are little more than this.  In atomman, the table load/dump style provides a means of converting to/from a generic table format.  As such, the table-style converter can be used to help with defining converters from more specific tabular formats."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Library Imports**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "atomman version = 1.4.0\n",
      "Notebook executed on 2021-08-04\n"
     ]
    }
   ],
   "source": [
    "# Standard Python libraries\n",
    "import datetime\n",
    "\n",
    "# http://www.numpy.org/\n",
    "import numpy as np\n",
    "\n",
    "import atomman as am\n",
    "import atomman.unitconvert as uc\n",
    "\n",
    "# Show atomman version\n",
    "print('atomman version =', am.__version__)\n",
    "\n",
    "# Show date of Notebook execution\n",
    "print('Notebook executed on', datetime.date.today())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Generate test system information (CsCl)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "avect =  [ 3.200,  0.000,  0.000]\n",
      "bvect =  [ 0.000,  3.200,  0.000]\n",
      "cvect =  [ 0.000,  0.000,  3.200]\n",
      "origin = [ 0.000,  0.000,  0.000]\n",
      "natoms = 2\n",
      "natypes = 2\n",
      "symbols = ('Cs', 'Cl')\n",
      "pbc = [ True  True  True]\n",
      "per-atom properties = ['atype', 'pos', 'charge', 'stress']\n",
      "     id |   atype |  pos[0] |  pos[1] |  pos[2]\n",
      "      0 |       1 |   0.000 |   0.000 |   0.000\n",
      "      1 |       2 |   1.600 |   1.600 |   1.600\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>atype</th>\n",
       "      <th>pos[0]</th>\n",
       "      <th>pos[1]</th>\n",
       "      <th>pos[2]</th>\n",
       "      <th>charge</th>\n",
       "      <th>stress[0][0]</th>\n",
       "      <th>stress[0][1]</th>\n",
       "      <th>stress[0][2]</th>\n",
       "      <th>stress[1][0]</th>\n",
       "      <th>stress[1][1]</th>\n",
       "      <th>stress[1][2]</th>\n",
       "      <th>stress[2][0]</th>\n",
       "      <th>stress[2][1]</th>\n",
       "      <th>stress[2][2]</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1.6</td>\n",
       "      <td>1.6</td>\n",
       "      <td>1.6</td>\n",
       "      <td>-1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   atype  pos[0]  pos[1]  pos[2]  charge  stress[0][0]  stress[0][1]  \\\n",
       "0      1     0.0     0.0     0.0     1.0           0.0           0.0   \n",
       "1      2     1.6     1.6     1.6    -1.0           0.0           0.0   \n",
       "\n",
       "   stress[0][2]  stress[1][0]  stress[1][1]  stress[1][2]  stress[2][0]  \\\n",
       "0           0.0           0.0           0.0           0.0           0.0   \n",
       "1           0.0           0.0           0.0           0.0           0.0   \n",
       "\n",
       "   stress[2][1]  stress[2][2]  \n",
       "0           0.0           0.0  \n",
       "1           0.0           0.0  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Generate box\n",
    "alat = uc.set_in_units(3.2, 'angstrom')\n",
    "box = am.Box(a=alat, b=alat, c=alat)\n",
    "\n",
    "# Generate atoms with atype, pos, charge, and stress properties\n",
    "atype = [1, 2]\n",
    "pos = [[0,0,0], [0.5, 0.5, 0.5]]\n",
    "charge = uc.set_in_units([1, -1], 'e')\n",
    "stress = uc.set_in_units(np.zeros((2, 3, 3)), 'MPa')\n",
    "atoms = am.Atoms(pos=pos, atype=atype, charge=charge, stress=stress)\n",
    "\n",
    "# Build system from box and atoms, and scale atoms\n",
    "system = am.System(atoms=atoms, box=box, scale=True, symbols=['Cs', 'Cl'])\n",
    "\n",
    "# Print system information\n",
    "print(system)\n",
    "system.atoms_df()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. System.dump('table')<a id='section2'></a>\n",
    "\n",
    "Parameters\n",
    "\n",
    "- **f** (*str or file-like object, optional*) File path or file-like object to write the table to.  If not given, then the table is returned as a string.\n",
    "\n",
    "- **prop_name** (*list, optional*) The Atoms properties to include.  Must be given if prop_info is not.\n",
    "\n",
    "- **table_name** (*list, optional*) The table column name(s) that correspond to each prop_name.  If not given, the table_name values will be based on the prop_name values.\n",
    "\n",
    "- **shape** (*list, optional*) The shape of each per-atom property.  If not given, will be inferred from the length of each table_name value.\n",
    "\n",
    "- **unit** (*list, optional*) Lists the units for each prop_name as stored in the table.  For a value of None, no conversion will be performed for that property.  For a value of 'scaled', the corresponding table values will be taken in box-scaled units.  If not given, all unit values will be set to None (i.e. no conversions).\n",
    "\n",
    "- **dtype** (*list, optional*) Allows for the data type of each property to be explicitly given. Values of None will infer the data type from the corresponding property values.  If not given, all values will be None.\n",
    "\n",
    "- **prop_info** (*list of dict, optional*) Structured form of property conversion parameters, in which each dictionary in the list corresponds to a single atoms property.  Each dictionary must have a 'prop_name' field, and can optionally have 'table_name', 'shape', 'unit', and 'dtype' fields.\n",
    "\n",
    "- **header** (*bool, optional*) Flag indicating whether to include the column names in the outputted table.  Default value is False (no column names).\n",
    "\n",
    "- **float_format** (*str, optional*) c-style formatting string for floating point values.  Default value is '%.13f'.\n",
    "\n",
    "- **return_prop_info** (*bool, optional*) Flag indicating if the filled-in prop_info is to be returned.  Having this allows for 1:1 load/dump conversions.  Default value is False (prop_info is not returned).\n",
    "\n",
    "- **extra** (*dict, optional*)  Allows extra per-atom data that is not part of the System to be included in the generated table.  Useful when the per-atom data only has meaning in the tabular format and should not be added to System.\n",
    "\n",
    "Returns\n",
    "\n",
    "- (*str*) The generated data table.  Only returned if fp is None."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dump data without a header"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 0.00000 0.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000\n",
      "2 1.60000 1.60000 1.60000 -1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000\n",
      "\n"
     ]
    }
   ],
   "source": [
    "table = system.dump('table', float_format='%.5f')\n",
    "print(table)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dump data with a header"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "atype pos[0] pos[1] pos[2] charge stress[0][0] stress[0][1] stress[0][2] stress[1][0] stress[1][1] stress[1][2] stress[2][0] stress[2][1] stress[2][2]\n",
      "1 0.00000 0.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000\n",
      "2 1.60000 1.60000 1.60000 -1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000\n",
      "\n"
     ]
    }
   ],
   "source": [
    "table = system.dump('table', float_format='%.5f', header=True)\n",
    "print(table)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Setting return_prop_info=True will also return a list of dictionaries that provides the transformation mapping between the table format and the scalar/vector/tensor per-atom properties."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "atype pos[0] pos[1] pos[2] charge stress[0][0] stress[0][1] stress[0][2] stress[1][0] stress[1][1] stress[1][2] stress[2][0] stress[2][1] stress[2][2]\n",
      "1 0.00000 0.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000\n",
      "2 1.60000 1.60000 1.60000 -1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000\n",
      "\n"
     ]
    }
   ],
   "source": [
    "table, prop_info = system.dump('table', float_format='%.5f', header=True, return_prop_info=True)\n",
    "print(table)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'prop_name': 'atype', 'shape': (), 'table_name': ['atype'], 'unit': None, 'dtype': None}\n",
      "{'prop_name': 'pos', 'shape': (3,), 'table_name': ['pos[0]', 'pos[1]', 'pos[2]'], 'unit': None, 'dtype': None}\n",
      "{'prop_name': 'charge', 'shape': (), 'table_name': ['charge'], 'unit': None, 'dtype': None}\n",
      "{'prop_name': 'stress', 'shape': (3, 3), 'table_name': ['stress[0][0]', 'stress[0][1]', 'stress[0][2]', 'stress[1][0]', 'stress[1][1]', 'stress[1][2]', 'stress[2][0]', 'stress[2][1]', 'stress[2][2]'], 'unit': None, 'dtype': None}\n"
     ]
    }
   ],
   "source": [
    "# Show the generated prop_info conversion info\n",
    "for pinfo in prop_info:\n",
    "    print(pinfo)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. atomman.load('table')<a id='section3'></a>\n",
    "\n",
    "Parameters\n",
    "\n",
    "- **table** (*str or file-like object*) The table content, file path or file-like object containing the content to read.\n",
    "\n",
    "- **box** (*atomman.Box*) The atomic box to use when generating a System around the data.\n",
    "\n",
    "- **symbols** (*tuple, optional*) Allows the list of element symbols to be assigned during loading.\n",
    "\n",
    "- **system** (*atomman.System, optional*) The atomic system to load the values to.  If not given, a new system will be constructed.\n",
    "\n",
    "- **prop_name** (*list, optional*) The Atoms properties to generate.  Must be given if prop_info is not.\n",
    "\n",
    "- **table_name** (*list, optional*) The table column name(s) that correspond to each prop_name.  If not given, the table_name values will be based on the prop_name values.\n",
    "\n",
    "- **shape** (*list, optional*) The shape of each per-atom property.  If not given, will be inferred from the length of each table_name value.\n",
    "\n",
    "- **unit** (*list, optional*) Lists the units for each prop_name as stored in the table.  For a value of None, no conversion will be performed for that property.  For a value of 'scaled', the corresponding table values will be taken in box-scaled units.  If not given, all unit values will be set to None (i.e. no conversions).\n",
    "\n",
    "- **dtype** (*list, optional*) Allows for the data type of each property to be explicitly given.  Values of None will infer the data type from the corresponding property values.  If not given, all values will be None.\n",
    "\n",
    "- **prop_info** (*list of dict, optional*) Structured form of property conversion parameters, in which each dictionary in the list corresponds to a single atoms property.  Each dictionary must have a 'prop_name' field, and can optionally have 'table_name', 'shape', 'unit', and 'dtype' fields.\n",
    "\n",
    "- **skiprows** (*int, optional*) Number of rows to skip before reading the data.\n",
    "\n",
    "- **usecols** (*int, optional*) Which columns are to be used. Will be passed to pandas.read_csv() usecols option.\n",
    "\n",
    "- **nrows** (*int, optional*) Number of rows of data to read.\n",
    "\n",
    "- **comment** (*str, optional*) Specifies a character which indicates all text on a given line after is to be considered to be a comment and ignored by parser. This is often '#'.\n",
    "\n",
    "Returns\n",
    "\n",
    "- (*atomman.System*) The generated system."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "atomman.load('table') has a large number of parameters to allow many varieties of tabular data to be interpreted.  Here, parameters are selected to read in the table generated above to reproduce the original system."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "atype pos[0] pos[1] pos[2] charge stress[0][0] stress[0][1] stress[0][2] stress[1][0] stress[1][1] stress[1][2] stress[2][0] stress[2][1] stress[2][2]\n",
      "1 0.00000 0.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000\n",
      "2 1.60000 1.60000 1.60000 -1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000\n",
      "\n",
      "\n",
      "[{'prop_name': 'atype', 'shape': (), 'table_name': ['atype'], 'unit': None, 'dtype': None}, {'prop_name': 'pos', 'shape': (3,), 'table_name': ['pos[0]', 'pos[1]', 'pos[2]'], 'unit': None, 'dtype': None}, {'prop_name': 'charge', 'shape': (), 'table_name': ['charge'], 'unit': None, 'dtype': None}, {'prop_name': 'stress', 'shape': (3, 3), 'table_name': ['stress[0][0]', 'stress[0][1]', 'stress[0][2]', 'stress[1][0]', 'stress[1][1]', 'stress[1][2]', 'stress[2][0]', 'stress[2][1]', 'stress[2][2]'], 'unit': None, 'dtype': None}]\n"
     ]
    }
   ],
   "source": [
    "# Show table and prop_info again\n",
    "print(table)\n",
    "print()\n",
    "print(prop_info)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "avect =  [ 3.200,  0.000,  0.000]\n",
      "bvect =  [ 0.000,  3.200,  0.000]\n",
      "cvect =  [ 0.000,  0.000,  3.200]\n",
      "origin = [ 0.000,  0.000,  0.000]\n",
      "natoms = 2\n",
      "natypes = 2\n",
      "symbols = ('Cs', 'Cl')\n",
      "pbc = [ True  True  True]\n",
      "per-atom properties = ['atype', 'pos', 'charge', 'stress']\n",
      "     id |   atype |  pos[0] |  pos[1] |  pos[2]\n",
      "      0 |       1 |   0.000 |   0.000 |   0.000\n",
      "      1 |       2 |   1.600 |   1.600 |   1.600\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>atype</th>\n",
       "      <th>pos[0]</th>\n",
       "      <th>pos[1]</th>\n",
       "      <th>pos[2]</th>\n",
       "      <th>charge</th>\n",
       "      <th>stress[0][0]</th>\n",
       "      <th>stress[0][1]</th>\n",
       "      <th>stress[0][2]</th>\n",
       "      <th>stress[1][0]</th>\n",
       "      <th>stress[1][1]</th>\n",
       "      <th>stress[1][2]</th>\n",
       "      <th>stress[2][0]</th>\n",
       "      <th>stress[2][1]</th>\n",
       "      <th>stress[2][2]</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1.6</td>\n",
       "      <td>1.6</td>\n",
       "      <td>1.6</td>\n",
       "      <td>-1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   atype  pos[0]  pos[1]  pos[2]  charge  stress[0][0]  stress[0][1]  \\\n",
       "0      1     0.0     0.0     0.0     1.0           0.0           0.0   \n",
       "1      2     1.6     1.6     1.6    -1.0           0.0           0.0   \n",
       "\n",
       "   stress[0][2]  stress[1][0]  stress[1][1]  stress[1][2]  stress[2][0]  \\\n",
       "0           0.0           0.0           0.0           0.0           0.0   \n",
       "1           0.0           0.0           0.0           0.0           0.0   \n",
       "\n",
       "   stress[2][1]  stress[2][2]  \n",
       "0           0.0           0.0  \n",
       "1           0.0           0.0  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Pass table with box and prop_info\n",
    "table_system = am.load('table', table, box=system.box, symbols=['Cs', 'Cl'],\n",
    "                       skiprows=1, prop_info=prop_info)\n",
    "\n",
    "print(table_system)\n",
    "table_system.atoms_df()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0. 0. 0.]\n",
      " [0. 0. 0.]\n",
      " [0. 0. 0.]]\n"
     ]
    }
   ],
   "source": [
    "# Using the generated prop_info from dump makes the resulting properties have the right shape.\n",
    "print(table_system.atoms.stress[0])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
